fix(backend): performance issues when generating artifacts #7320
Conversation
Walkthrough: This pull request introduces performance optimizations and API refinements across the artifact and schema management systems.

CodSpeed Performance Report: Merging #7320 will not alter performance.
There's a follow-up PR related to memory usage when generating artifacts: #7327
Actionable comments posted: 0
🧹 Nitpick comments (2)
backend/infrahub/core/schema/manager.py (1)
148-149: Consider documenting the side effect in get_schema_branch.

The method now has a side effect of calling `set_schema_branch` when the branch doesn't exist, which populates the hash mapping. While this ensures consistency, it changes the method from a pure getter to one with side effects.

Consider adding a brief comment explaining this behavior:

```diff
 def get_schema_branch(self, name: str) -> SchemaBranch:
     if name in self._branches:
         return self._branches[name]
+    # Create and register a new branch schema to ensure hash mapping exists
     self.set_schema_branch(name, schema=SchemaBranch(cache=self._cache, name=name))
     return self._branches[name]
```

backend/infrahub/artifacts/models.py (1)
28-29: Improve field descriptions and deprecation handling.

Both `query` and `query_id` have identical descriptions, which is confusing. Additionally, the deprecation is only indicated via a comment.

Consider these improvements:

```diff
-    query: str = Field(..., description="The name of the query to use when collecting data")  # Deprecated
-    query_id: str = Field(..., description="The name of the query to use when collecting data")
+    query: str = Field(..., description="(Deprecated: use query_id) The name of the query to use when collecting data", deprecated=True)
+    query_id: str = Field(..., description="The ID of the query to use when collecting data")
```

Note: If using Pydantic v2, the `deprecated` parameter provides formal deprecation warnings. If on Pydantic v1, consider adding the deprecation notice in the description.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
- `poetry.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (12)
- backend/infrahub/artifacts/models.py (1 hunks)
- backend/infrahub/computed_attribute/tasks.py (1 hunks)
- backend/infrahub/core/schema/manager.py (4 hunks)
- backend/infrahub/core/schema/schema_branch.py (1 hunks)
- backend/infrahub/git/integrator.py (1 hunks)
- backend/infrahub/git/models.py (1 hunks)
- backend/infrahub/git/repository.py (3 hunks)
- backend/infrahub/git/tasks.py (7 hunks)
- backend/infrahub/message_bus/types.py (1 hunks)
- backend/infrahub/proposed_change/tasks.py (3 hunks)
- backend/infrahub/workers/dependencies.py (2 hunks)
- pyproject.toml (1 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
backend/**/*
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Run backend tests with `pytest` or via `invoke` tasks
Files:
- backend/infrahub/core/schema/schema_branch.py
- backend/infrahub/computed_attribute/tasks.py
- backend/infrahub/message_bus/types.py
- backend/infrahub/core/schema/manager.py
- backend/infrahub/proposed_change/tasks.py
- backend/infrahub/git/integrator.py
- backend/infrahub/git/models.py
- backend/infrahub/artifacts/models.py
- backend/infrahub/git/repository.py
- backend/infrahub/workers/dependencies.py
- backend/infrahub/git/tasks.py
**/*.py
📄 CodeRabbit inference engine (.github/instructions/python-docstring.instructions.md)
**/*.py: Use triple double quotes (""") for all Python docstrings
Write docstrings in Google-style format
Include a brief one-line description at the top of each docstring
Add a detailed description section when additional context is needed
Document function/method parameters under an Args/Parameters section without typing information
Include a Returns section describing the return value
Include a Raises section listing possible exceptions
Provide an Examples section demonstrating usage when helpful
Use ruff and mypy to validate and lint Python files
**/*.py: Use type hints for all Python function parameters and return values
Prefer asynchronous code in Python when feasible
Define asynchronous Python functions with `async def`
Use `await` for asynchronous calls in Python
Use Pydantic models instead of standard dataclasses
Use ruff and mypy for linting and type checking
**/*.py: Use type hints for all Python function parameters and return values
Use async/await whenever possible in Python code
Define asynchronous functions with `async def`
Await asynchronous calls with `await`
Use Pydantic models instead of standard dataclasses for data modeling
Use triple quotes (""") for all Python docstrings
Write docstrings in Google-style format
Include docstring sections when applicable: one-line summary, optional details, Args (without types), Returns, Raises, Examples
Validate and lint Python with ruff and mypy
Files:
- backend/infrahub/core/schema/schema_branch.py
- backend/infrahub/computed_attribute/tasks.py
- backend/infrahub/message_bus/types.py
- backend/infrahub/core/schema/manager.py
- backend/infrahub/proposed_change/tasks.py
- backend/infrahub/git/integrator.py
- backend/infrahub/git/models.py
- backend/infrahub/artifacts/models.py
- backend/infrahub/git/repository.py
- backend/infrahub/workers/dependencies.py
- backend/infrahub/git/tasks.py
{pyproject.toml,poetry.lock}
📄 CodeRabbit inference engine (.github/instructions/tooling.instructions.md)
Use Poetry to manage the Python project and its dependencies
Files:
- pyproject.toml
pyproject.toml
📄 CodeRabbit inference engine (CLAUDE.md)
Use Poetry to manage the Python project and dependencies
Files:
- pyproject.toml
🧬 Code graph analysis (6)
backend/infrahub/core/schema/schema_branch.py (2)
  backend/infrahub/core/models.py (2)
    nodes (61-63)
    generics (66-68)
  backend/infrahub/core/schema/__init__.py (1)
    get (65-72)
backend/infrahub/computed_attribute/tasks.py (1)
  backend/infrahub/trigger/models.py (1)
    name (79-80)
backend/infrahub/core/schema/manager.py (1)
  backend/infrahub/core/schema/schema_branch.py (3)
    get (303-329)
    to_dict_api_schema_object (165-171)
    get_hash (139-149)
backend/infrahub/git/repository.py (1)
  backend/infrahub/message_bus/operations/git/file.py (1)
    get (14-35)
backend/infrahub/workers/dependencies.py (1)
  backend/infrahub/core/schema/manager.py (2)
    get_branches (793-794)
    get_sdk_schema_branch (151-159)
backend/infrahub/git/tasks.py (4)
  backend/infrahub/types.py (1)
    Any (312-319)
  backend/infrahub/core/schema/basenode_schema.py (2)
    attribute_names (339-340)
    relationship_names (343-344)
  backend/infrahub/workers/dependencies.py (1)
    get_workflow (122-123)
  backend/infrahub/services/adapters/workflow/local.py (1)
    submit_workflow (33-41)
🔇 Additional comments (18)
backend/infrahub/computed_attribute/tasks.py (1)
106-112: LGTM! Query ID transition looks correct.

The change from `transform.query.peer.name.value` to `transform.query.id` aligns with the broader migration to use query IDs instead of query names across the codebase.

backend/infrahub/git/models.py (1)
41-42: LGTM! Proper deprecation pattern.

The addition of `query_id` while keeping the deprecated `query` field ensures backward compatibility during the migration period.

backend/infrahub/message_bus/types.py (1)
92-93: LGTM! Consistent deprecation pattern.

The addition of `query_id` while deprecating `query_name` follows the same pattern used elsewhere in the codebase, ensuring consistent migration.

backend/infrahub/core/schema/schema_branch.py (1)
165-171: LGTM! API-friendly serialization method.

The new `to_dict_api_schema_object` method provides an API-focused serialization that returns lists of dictionaries (via `model_dump()`), complementing the existing `to_dict_schema_object` which returns schema objects. This aligns with the SDK-backed schema caching introduced in the manager.

backend/infrahub/core/schema/manager.py (3)
46-47: LGTM! LRU cache addresses memory concerns.

The LRU cache with `maxsize=10` provides a good balance between performance and memory usage, addressing the concerns raised in previous review discussions about memory consumption with many branches. The hash-based keying ensures efficient cache invalidation when schemas change.
151-159: LGTM! Efficient SDK schema retrieval with hash-based caching.The method correctly uses the branch hash as the cache key, ensuring that identical schemas across different branches share the same cached SDK representation. The lazy initialization pattern (check cache, construct if missing) is appropriate.
783-784: LGTM! Proper cleanup in purge logic.

The addition correctly removes hash mappings when branches are purged, preventing memory leaks and stale cache entries.
backend/infrahub/workers/dependencies.py (1)
38-44: LGTM! Pre-population of SDK schema cache.

The initialization-time population of the client schema cache ensures that SDK representations are available immediately without triggering expensive on-the-fly conversions. The guard `if registry.schema` safely handles cases where the registry isn't fully initialized yet.

backend/infrahub/git/repository.py (2)
254-259: LGTM! TTL cache aligns with Git Sync schedule.

The 30-second TTL cache for repository initialization aligns well with the 1-minute Git Sync scheduled job mentioned in the PR description. The cache key correctly includes all parameters that affect repository state (repository_id, name, kind, commit).
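As a reference point, a minimal sketch of this TTL pattern with cachetools; it assumes the async-aware `cached` decorator from the cachetools-async package added in this PR, and the function body is a placeholder for the real initialization work.

```python
import asyncio

from cachetools import TTLCache
from cachetools.keys import hashkey
from cachetools_async import cached  # assumption: decorator exposed by cachetools-async


@cached(
    TTLCache(maxsize=100, ttl=30),  # entries expire after 30 seconds
    key=lambda *_, **kwargs: hashkey(kwargs.get("repository_id"), kwargs.get("commit")),
)
async def _get_initialized_repo_sketch(*, repository_id: str, commit: str | None = None) -> str:
    await asyncio.sleep(0)  # placeholder for the expensive initialization and GraphQL calls
    return f"{repository_id}@{commit}"


async def main() -> None:
    # A second call within 30 seconds with the same key returns the cached result.
    first = await _get_initialized_repo_sketch(repository_id="repo-1", commit="abc123")
    second = await _get_initialized_repo_sketch(repository_id="repo-1", commit="abc123")
    assert first == second


asyncio.run(main())
```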
272-282: LGTM! Clean separation of caching and Prefect task concerns.

The wrapper pattern correctly separates the cached implementation (`_get_initialized_repo`) from the Prefect task wrapper (`get_initialized_repo`), with `cache_policy=NONE` preventing duplicate caching by Prefect. This ensures caching is handled exclusively by cachetools as intended.

pyproject.toml (1)
79-79: No update required for cachetools-async version.

The constraint `^0.0.5` already references the latest published version (0.0.5) on PyPI.

Likely an incorrect or invalid review comment.
backend/infrahub/git/integrator.py (1)
1366-1366: LGTM! Consistent with query_id propagation.

The change from `message.query` to `message.query_id` aligns with the broader refactoring to use query identifiers throughout the artifact generation flow.

backend/infrahub/artifacts/models.py (1)
33-33: LGTM! Context field addition is appropriate.

Adding the `context` field enables better context propagation through the artifact processing pipeline.

backend/infrahub/proposed_change/tasks.py (1)
694-694: LGTM! Consistent query_id propagation throughout the flow.

The changes correctly:

- Pass `query_id` to `CheckArtifactCreate` (line 694)
- Fetch the query node's `id` in the GraphQL query (line 1249)
- Parse and assign the `query_id` from the response (line 1471)

The implementation is consistent with the broader query_id refactoring across the codebase.
Also applies to: 1249-1249, 1471-1471
backend/infrahub/git/tasks.py (4)
62-78: LGTM! Helper function correctly extracts parameter field names.

The `_collect_parameter_first_segments` function properly extracts the first segment of parameter keys (e.g., "foo__bar" → "foo") by recursively walking through nested data structures.
81-126: LGTM! Efficient prefetching optimization.

The `_prefetch_group_member_nodes` function implements a smart optimization by:
- Grouping member IDs by kind to batch requests
- Filtering attributes and relationships to fetch only required fields
- Disabling ordering for better performance
- Using the SDK's batch execution for efficiency
This should significantly reduce both the number of API calls and the amount of data transferred when processing artifact definitions with many targets.
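A rough sketch of just the grouping step (the batched SDK fetch itself is omitted); the `Member` type here is a stand-in for the SDK's relationship peers, not a real Infrahub class.

```python
from collections import defaultdict
from typing import NamedTuple


class Member(NamedTuple):
    id: str
    typename: str  # stand-in for the node kind exposed by the SDK


def group_ids_by_kind(members: list[Member]) -> dict[str, set[str]]:
    """Group member ids by kind so each kind is fetched with one batched request."""
    ids_per_kind: dict[str, set[str]] = defaultdict(set)
    for member in members:
        ids_per_kind[member.typename].add(member.id)
    return dict(ids_per_kind)


members = [Member("r1", "InfraDevice"), Member("r2", "InfraDevice"), Member("s1", "InfraSite")]
assert group_ids_by_kind(members) == {"InfraDevice": {"r1", "r2"}, "InfraSite": {"s1"}}
```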
396-409: LGTM! Improved target fetching with prefetching optimization.

The changes improve artifact generation by:

- Explicitly fetching the group with members using `client.get()`
- Extracting required parameter fields from the artifact definition
- Prefetching member node data upfront to avoid N+1 queries
This should significantly improve performance when generating artifacts for definitions with many targets.
441-480: LGTM! Batched workflow submissions improve throughput.

The changes batch artifact generation workflow submissions, which shifts the bottleneck to the Prefect API servers (as discussed in previous reviews). This enables better scalability by allowing the Prefect infrastructure to be scaled independently.

The addition of `query_id` on line 462 correctly propagates the query identifier through the artifact generation flow.
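For context, the shape of the batched submission pattern being reviewed; `create_batch`, `add`, and `execute` match the SDK usage shown in diffs later in this thread, while `build_artifact_model` is a hypothetical helper.

```python
from typing import Any


async def submit_artifact_workflows(client: Any, group: Any, context: Any) -> None:
    """Sketch: enqueue one workflow submission per group member, then drain the batch."""
    batch = await client.create_batch()
    for relationship in group.members.peers:
        # build_artifact_model is a hypothetical helper; the real flow builds
        # a RequestArtifactGenerate model per member.
        model = build_artifact_model(relationship)
        batch.add(
            task=get_workflow().submit_workflow,
            workflow=REQUEST_ARTIFACT_GENERATE,
            context=context,
            parameters={"model": model},
        )
    # Submissions execute concurrently inside the batch; results are drained here.
    async for _, _ in batch.execute():
        pass
```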
When generating a lot of artifacts at the same time (for example when merging a huge branch), repo initialization is done for each artifact, concurrently. Avoid useless initialization steps that trigger many GraphQL queries to fetch and update the repo status by leveraging Prefect caching with a small TTL. Signed-off-by: Fatih Acar <[email protected]>
Only those required by the artifact target parameters. AI Signed-off-by: Fatih Acar <[email protected]>
Signed-off-by: Fatih Acar <[email protected]>
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️  Outside diff range comments (1)
backend/infrahub/git/tasks.py (1)
327-397: Use the correct keyword when calling fetch_definition_targets.

`fetch_definition_targets` expects a `definition=` argument. Passing `artifact_definition=` raises `TypeError: got an unexpected keyword argument 'artifact_definition'` and aborts the flow before any artifacts are queued. Update the call to match the helper's signature.

```diff
-    group = await fetch_definition_targets(client=client, branch=model.branch, artifact_definition=artifact_definition)
+    group = await fetch_definition_targets(client=client, branch=model.branch, definition=artifact_definition)
```
🧹 Nitpick comments (1)
backend/infrahub/artifacts/models.py (1)
28-29: Fix the query_id description.

The new field still says "The name of the query…", which contradicts the field's intent and the deprecation notice above it. Please update the description so it clearly references the query ID.

```diff
-    query_id: str = Field(..., description="The name of the query to use when collecting data")
+    query_id: str = Field(..., description="The id of the query to use when collecting data")
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
- `poetry.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (14)
- backend/infrahub/artifacts/models.py (1 hunks)
- backend/infrahub/computed_attribute/tasks.py (1 hunks)
- backend/infrahub/core/schema/manager.py (4 hunks)
- backend/infrahub/core/schema/schema_branch.py (1 hunks)
- backend/infrahub/generators/tasks.py (2 hunks)
- backend/infrahub/git/integrator.py (1 hunks)
- backend/infrahub/git/models.py (1 hunks)
- backend/infrahub/git/repository.py (3 hunks)
- backend/infrahub/git/tasks.py (6 hunks)
- backend/infrahub/git/utils.py (2 hunks)
- backend/infrahub/message_bus/types.py (1 hunks)
- backend/infrahub/proposed_change/tasks.py (7 hunks)
- backend/infrahub/workers/dependencies.py (2 hunks)
- pyproject.toml (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- backend/infrahub/workers/dependencies.py
- pyproject.toml
- backend/infrahub/computed_attribute/tasks.py
🧰 Additional context used
📓 Path-based instructions (2)
backend/**/*
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Run backend tests with `pytest` or via `invoke` tasks
Files:
- backend/infrahub/message_bus/types.py
- backend/infrahub/core/schema/schema_branch.py
- backend/infrahub/git/repository.py
- backend/infrahub/proposed_change/tasks.py
- backend/infrahub/git/integrator.py
- backend/infrahub/core/schema/manager.py
- backend/infrahub/generators/tasks.py
- backend/infrahub/git/tasks.py
- backend/infrahub/artifacts/models.py
- backend/infrahub/git/utils.py
- backend/infrahub/git/models.py
**/*.py
📄 CodeRabbit inference engine (.github/instructions/python-docstring.instructions.md)
**/*.py: Use triple double quotes (""") for all Python docstrings
Write docstrings in Google-style format
Include a brief one-line description at the top of each docstring
Add a detailed description section when additional context is needed
Document function/method parameters under an Args/Parameters section without typing information
Include a Returns section describing the return value
Include a Raises section listing possible exceptions
Provide an Examples section demonstrating usage when helpful
Use ruff and mypy to validate and lint Python files
**/*.py: Use type hints for all Python function parameters and return values
Prefer asynchronous code in Python when feasible
Define asynchronous Python functions with `async def`
Use `await` for asynchronous calls in Python
Use Pydantic models instead of standard dataclasses
Use ruff and mypy for linting and type checking
**/*.py: Use type hints for all Python function parameters and return values
Use async/await whenever possible in Python code
Define asynchronous functions with `async def`
Await asynchronous calls with `await`
Use Pydantic models instead of standard dataclasses for data modeling
Use triple quotes (""") for all Python docstrings
Write docstrings in Google-style format
Include docstring sections when applicable: one-line summary, optional details, Args (without types), Returns, Raises, Examples
Validate and lint Python with ruff and mypy
Files:
- backend/infrahub/message_bus/types.py
- backend/infrahub/core/schema/schema_branch.py
- backend/infrahub/git/repository.py
- backend/infrahub/proposed_change/tasks.py
- backend/infrahub/git/integrator.py
- backend/infrahub/core/schema/manager.py
- backend/infrahub/generators/tasks.py
- backend/infrahub/git/tasks.py
- backend/infrahub/artifacts/models.py
- backend/infrahub/git/utils.py
- backend/infrahub/git/models.py
🧬 Code graph analysis (7)
backend/infrahub/core/schema/schema_branch.py (3)
  backend/infrahub/core/schema/__init__.py (2)
    duplicate (97-99)
    get (65-72)
  backend/infrahub/core/schema/manager.py (1)
    get (69-92)
  backend/infrahub/database/__init__.py (1)
    get (84-88)
backend/infrahub/git/repository.py (1)
  backend/infrahub/message_bus/operations/git/file.py (1)
    get (14-35)
backend/infrahub/proposed_change/tasks.py (1)
  backend/infrahub/git/utils.py (1)
    fetch_definition_targets (123-153)
backend/infrahub/core/schema/manager.py (1)
  backend/infrahub/core/schema/schema_branch.py (3)
    get (303-329)
    to_dict_api_schema_object (165-171)
    get_hash (139-149)
backend/infrahub/generators/tasks.py (1)
  backend/infrahub/git/utils.py (1)
    fetch_definition_targets (123-153)
backend/infrahub/git/tasks.py (3)
  backend/infrahub/git/utils.py (1)
    fetch_definition_targets (123-153)
  backend/infrahub/workers/dependencies.py (1)
    get_workflow (122-123)
  backend/infrahub/services/adapters/workflow/local.py (1)
    submit_workflow (33-41)
backend/infrahub/git/utils.py (3)
  backend/infrahub/core/protocols.py (3)
    CoreArtifactDefinition (280-287)
    CoreCheckDefinition (308-318)
    CoreGroup (130-138)
  backend/infrahub/generators/models.py (1)
    ProposedChangeGeneratorDefinition (34-44)
  backend/infrahub/core/schema/basenode_schema.py (2)
    attribute_names (339-340)
    relationship_names (343-344)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: E2E-testing-version-upgrade / From 1.3.0
- GitHub Check: validate-generated-documentation
- GitHub Check: backend-validate-generated
- GitHub Check: backend-tests-integration
- GitHub Check: E2E-testing-playwright
- GitHub Check: backend-tests-functional
- GitHub Check: backend-tests-unit
- GitHub Check: backend-benchmark
- GitHub Check: E2E-testing-invoke-demo-start
- GitHub Check: backend-docker-integration
- GitHub Check: Cloudflare Pages
        
          
backend/infrahub/git/utils.py (Outdated)
```python
keep_attributes = {field for field in required_fields if field in schema.attribute_names}
keep_relationships = {field for field in required_fields if field in schema.relationship_names}

exclude_attributes = [attr for attr in schema.attribute_names if attr not in keep_attributes]
exclude_relationships = [rel for rel in schema.relationship_names if rel not in keep_relationships]
```
Call schema name helpers before iterating.
schema.attribute_names / schema.relationship_names are callable helpers, not iterables. Using them without () raises TypeError: argument of type 'method' is not iterable the first time we hit a branch with members. Call the helpers before computing the include/exclude sets.
```diff
-        keep_attributes = {field for field in required_fields if field in schema.attribute_names}
-        keep_relationships = {field for field in required_fields if field in schema.relationship_names}
+        attribute_names = schema.attribute_names()
+        relationship_names = schema.relationship_names()
+        keep_attributes = {field for field in required_fields if field in attribute_names}
+        keep_relationships = {field for field in required_fields if field in relationship_names}
@@
-        exclude_attributes = [attr for attr in schema.attribute_names if attr not in keep_attributes]
-        exclude_relationships = [rel for rel in schema.relationship_names if rel not in keep_relationships]
+        exclude_attributes = [attr for attr in attribute_names if attr not in keep_attributes]
+        exclude_relationships = [rel for rel in relationship_names if rel not in keep_relationships]
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
attribute_names = schema.attribute_names()
relationship_names = schema.relationship_names()
keep_attributes = {field for field in required_fields if field in attribute_names}
keep_relationships = {field for field in required_fields if field in relationship_names}

exclude_attributes = [attr for attr in attribute_names if attr not in keep_attributes]
exclude_relationships = [rel for rel in relationship_names if rel not in keep_relationships]
```
🤖 Prompt for AI Agents
In backend/infrahub/git/utils.py around lines 98 to 103, schema.attribute_names
and schema.relationship_names are methods but are being treated as iterables;
call these helpers first (e.g., attribute_names = schema.attribute_names() and
relationship_names = schema.relationship_names()) and then use those returned
iterables when computing keep_attributes, keep_relationships,
exclude_attributes, and exclude_relationships so the comprehensions iterate over
the actual lists/sets rather than the method objects.
Actionable comments posted: 1
♻️ Duplicate comments (1)
backend/infrahub/git/utils.py (1)
99-103: Call attribute_names() and relationship_names() methods.
`schema.attribute_names` and `schema.relationship_names` are methods, not properties. Using them without `()` will raise `TypeError: argument of type 'method' is not iterable` at runtime. Based on the relevant code snippet at backend/infrahub/core/schema/basenode_schema.py (lines 338-343), these are methods that return lists.

Apply this diff:

```diff
+        attribute_names = schema.attribute_names()
+        relationship_names = schema.relationship_names()
-        keep_attributes = {field for field in required_fields if field in schema.attribute_names}
-        keep_relationships = {field for field in required_fields if field in schema.relationship_names}
+        keep_attributes = {field for field in required_fields if field in attribute_names}
+        keep_relationships = {field for field in required_fields if field in relationship_names}
-        exclude_attributes = [attr for attr in schema.attribute_names if attr not in keep_attributes]
-        exclude_relationships = [rel for rel in schema.relationship_names if rel not in keep_relationships]
+        exclude_attributes = [attr for attr in attribute_names if attr not in keep_attributes]
+        exclude_relationships = [rel for rel in relationship_names if rel not in keep_relationships]
```
🧹 Nitpick comments (1)
backend/infrahub/git/utils.py (1)
132-138: Consider defensive error handling for unexpected definition types.

If an `InfrahubNode` is passed that doesn't match `ARTIFACTDEFINITION` or `CHECKDEFINITION`, `group_id` and `parameters` remain uninitialized, causing an `UnboundLocalError` at line 140.

While the type hint should prevent this at static analysis time, consider adding an explicit error for robustness:

```python
if isinstance(definition, InfrahubNode):
    if definition.typename in [InfrahubKind.ARTIFACTDEFINITION, InfrahubKind.CHECKDEFINITION]:
        group_id = definition.targets.id
        parameters = definition.parameters.value
    else:
        raise ValueError(f"Unsupported InfrahubNode typename: {definition.typename}")
elif isinstance(definition, ProposedChangeGeneratorDefinition):
    group_id = definition.group_id
    parameters = definition.parameters
else:
    raise TypeError(f"Unsupported definition type: {type(definition)}")
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- backend/infrahub/generators/tasks.py (2 hunks)
- backend/infrahub/git/tasks.py (6 hunks)
- backend/infrahub/git/utils.py (2 hunks)
- backend/infrahub/proposed_change/tasks.py (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/infrahub/generators/tasks.py
🧰 Additional context used
📓 Path-based instructions (2)
backend/**/*
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Run backend tests with `pytest` or via `invoke` tasks
Files:
- backend/infrahub/proposed_change/tasks.py
- backend/infrahub/git/utils.py
- backend/infrahub/git/tasks.py
**/*.py
📄 CodeRabbit inference engine (.github/instructions/python-docstring.instructions.md)
**/*.py: Use triple double quotes (""") for all Python docstrings
Write docstrings in Google-style format
Include a brief one-line description at the top of each docstring
Add a detailed description section when additional context is needed
Document function/method parameters under an Args/Parameters section without typing information
Include a Returns section describing the return value
Include a Raises section listing possible exceptions
Provide an Examples section demonstrating usage when helpful
Use ruff and mypy to validate and lint Python files
**/*.py: Use type hints for all Python function parameters and return values
Prefer asynchronous code in Python when feasible
Define asynchronous Python functions with `async def`
Use `await` for asynchronous calls in Python
Use Pydantic models instead of standard dataclasses
Use ruff and mypy for linting and type checking
**/*.py: Use type hints for all Python function parameters and return values
Use async/await whenever possible in Python code
Define asynchronous functions with `async def`
Await asynchronous calls with `await`
Use Pydantic models instead of standard dataclasses for data modeling
Use triple quotes (""") for all Python docstrings
Write docstrings in Google-style format
Include docstring sections when applicable: one-line summary, optional details, Args (without types), Returns, Raises, Examples
Validate and lint Python with ruff and mypy
Files:
- backend/infrahub/proposed_change/tasks.py
- backend/infrahub/git/utils.py
- backend/infrahub/git/tasks.py
🧬 Code graph analysis (3)
backend/infrahub/proposed_change/tasks.py (1)
  backend/infrahub/git/utils.py (1)
    fetch_definition_targets (124-155)
backend/infrahub/git/utils.py (3)
  backend/infrahub/core/protocols.py (3)
    CoreArtifactDefinition (280-287)
    CoreCheckDefinition (308-318)
    CoreGroup (130-138)
  backend/infrahub/generators/models.py (1)
    ProposedChangeGeneratorDefinition (34-44)
  backend/infrahub/core/schema/basenode_schema.py (2)
    attribute_names (339-340)
    relationship_names (343-344)
backend/infrahub/git/tasks.py (2)
  backend/infrahub/git/utils.py (1)
    fetch_definition_targets (124-155)
  backend/infrahub/services/adapters/workflow/local.py (1)
    submit_workflow (33-41)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: E2E-testing-version-upgrade / From 1.3.0
- GitHub Check: validate-generated-documentation
- GitHub Check: E2E-testing-playwright
- GitHub Check: E2E-testing-invoke-demo-start
- GitHub Check: backend-tests-functional
- GitHub Check: backend-benchmark
- GitHub Check: backend-tests-integration
- GitHub Check: backend-tests-unit
- GitHub Check: backend-docker-integration
- GitHub Check: Cloudflare Pages
🔇 Additional comments (6)
backend/infrahub/git/tasks.py (3)
327-328: LGTM! Centralized target resolution.

The refactor to use `fetch_definition_targets` consolidates target group retrieval and enables prefetching of member nodes, reducing API calls and improving performance.
359-398: LGTM! Batch submission improves throughput.

The batch-based workflow submission pattern shifts the bottleneck to Prefect API servers, enabling better scalability. The addition of `query_id` at line 380 aligns with the model changes to support query ID-based lookups.
580-580: LGTM! Consistent use of centralized target resolution.

backend/infrahub/proposed_change/tasks.py (3)
656-656: LGTM! Consistent use of centralized target resolution.
693-693: LGTM! Complete query_id propagation.

The addition of `query_id` to `CheckArtifactCreate` (line 693), the GraphQL query (line 1243), and the parsing logic (line 1465) ensures consistent propagation of query identifiers throughout the artifact generation flow.

Also applies to: 1243-1243, 1465-1465
920-922: LGTM! Centralized target resolution for generators.
```python
def _collect_parameter_first_segments(params: Any) -> set[str]:
    segments: set[str] = set()

    def _walk(value: Any) -> None:
        if isinstance(value, str):
            segment = value.split("__", 1)[0]
            if segment:
                segments.add(segment)
        elif isinstance(value, dict):
            for nested in value.values():
                _walk(nested)
        elif isinstance(value, (list, tuple, set)):
            for nested in value:
                _walk(nested)

    _walk(params)
    return segments
```
🛠️ Refactor suggestion | 🟠 Major
Add docstrings to new utility functions.
All three new functions (_collect_parameter_first_segments, _prefetch_group_member_nodes, and fetch_definition_targets) lack docstrings.
As per coding guidelines, Python functions should include Google-style docstrings with:
- A brief one-line description
- An Args section documenting parameters (without types)
- A Returns section describing the return value
Example for fetch_definition_targets:
```python
async def fetch_definition_targets(
    client: InfrahubClient,
    branch: str,
    definition: CoreArtifactDefinition | CoreCheckDefinition | ProposedChangeGeneratorDefinition,
) -> CoreGroup:
    """Fetch target group for a definition with prefetched member nodes.

    Retrieves the target group associated with the given artifact, check, or generator
    definition, and prefetches member nodes based on the parameters defined in the
    definition.

    Args:
        client: The Infrahub SDK client for API interactions
        branch: The branch name to fetch the group from
        definition: The definition object containing targets and parameters

    Returns:
        The target group with prefetched member nodes populated in the client store
    """
```

Also applies to: 77-122, 124-155
🤖 Prompt for AI Agents
In backend/infrahub/git/utils.py around lines 58-74, 77-122 and 124-155, the
three new functions (_collect_parameter_first_segments,
_prefetch_group_member_nodes, and fetch_definition_targets) are missing
docstrings; add Google-style docstrings to each function consisting of a
one-line summary, an Args section listing parameters (no types), and a Returns
section describing the return value (matching existing project style). For
fetch_definition_targets follow the provided example (brief summary, purpose,
Args: client, branch, definition, and Returns: the target group with prefetched
member nodes) and for the other two document their parameters and return
sets/structures similarly, ensuring consistency and placement immediately below
each def line.
Hey coderabbit, where is this requirement for docstring coming from?
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
# Look for coding guidelines or contribution documentation
echo "=== Looking for coding guidelines documentation ==="
fd -t f -e md | xargs grep -l -i "docstring\|coding.*guideline\|style.*guide\|contribution" 2>/dev/null | head -10
echo -e "\n=== Checking for Python linter configurations ==="
# Check for ruff, pylint, or other linter configs
fd -t f "ruff.toml|pylint|.pylintrc|pyproject.toml" -x cat {}
echo -e "\n=== Sampling docstring usage in existing utility functions ==="
# Check a few existing Python files in backend/infrahub for docstring patterns
fd -e py -t f . backend/infrahub/core backend/infrahub/git | head -5 | xargs -I {} sh -c 'echo "=== File: {} ===" && head -50 {}'
```
Length of output: 32967
@ogenstad The requirement for docstrings is not enforced by your linter configuration. Looking at your pyproject.toml, I can see that:
```toml
ignore = [
    "D",        # pydocstyle
    "DOC",      # pydoclint
    ...
]
```

Both pydocstyle (D) and pydoclint (DOC) rules are explicitly disabled.
However, I noticed that existing functions in the codebase often include docstrings. For example, the get_repositories_commit_per_branch function in the same file (backend/infrahub/git/utils.py) has a Google-style docstring:
```python
async def get_repositories_commit_per_branch(
    db: InfrahubDatabase,
) -> dict[str, RepositoryData]:
    """Get a list of all repositories and their commit on each branches.

    This method is similar to 'get_list_repositories' method in the Python SDK.
    NOTE: At some point, we should refactor this function to use a single Database query instead of one per branch
    """
```
    """My suggestion was based on this existing pattern in the codebase as a best practice, not a hard requirement from your configuration.
If you'd prefer not to receive docstring suggestions for this project, I can remember that for future reviews.
🧬 Code Graph Analysis Results
File: backend/infrahub/git/utils.py | Lines 1-28

Full snippet:

```python
async def get_repositories_commit_per_branch(
    db: InfrahubDatabase,
) -> dict[str, RepositoryData]:
    """Get a list of all repositories and their commit on each branches.

    This method is similar to 'get_list_repositories' method in the Python SDK.
    NOTE: At some point, we should refactor this function to use a single Database query instead of one per branch
    """
    repositories: dict[str, RepositoryData] = {}
    for branch in list(registry.branch.values()):
        repos: list[CoreGenericRepository] = await NodeManager.query(
            db=db,
            branch=branch,
            fields={"id": None, "name": None, "commit": None, "internal_status": None},
            schema=InfrahubKind.GENERICREPOSITORY,
        )
        for repository in repos:
            repo_name = repository.name.value
            if repo_name not in repositories:
                repositories[repo_name] = RepositoryData(
                    repository_id=repository.get_id(),
                    repository_name=repo_name,
                    branches={},
                )
            repositories[repo_name].branches[branch.name] = repository.commit.value  # type: ignore[attr-defined]
            repositories[repo_name].branch_info[branch.name] = RepositoryBranchInfo(
                internal_status=repository.internal_status.value
            )
    return repositories
```

File: backend/infrahub/core/manager.py | Lines 1-60

Summary: Docstring for NodeManager.query describes its purpose and arguments. It states that the method:

- Queries one or multiple nodes of a given type based on filter arguments.
- Accepts parameters including schema (NodeSchema or type), filters, fields, offset, limit, at, branch, include_source, include_owner, prefetch_relationships, account, partial_match, branch_agnostic, and order.
- Returns a list of Node objects (or a list of SchemaProtocol objects when the more specific overload is used).
- Includes behavior notes such as how branch resolution and HFID handling influence querying, and how results are formed.
- Includes a Python-style Args/Returns documentation block within the method docstring.

File: backend/infrahub/core/manager.py | Lines 60-110

Summary: Docstring for NodeManager.count describes its purpose, parameters, and return value for counting nodes that match filters. It notes:

- Accepts db, schema (type[SchemaProtocol] | NodeSchema | GenericSchema | ProfileSchema | TemplateSchema | str), filters, at, branch, account, partial_match, branch_agnostic.
- Returns an int representing the number of responses found.

File: backend/infrahub/core/manager.py | Lines 110-170

Summary: Docstring for NodeManager.query_peers describes its purpose and parameters for querying peer nodes. It notes:

- Accepts db, ids, source_kind, schema, filters, fields (optional), offset (optional), limit (optional), at (optional), branch (optional), branch_agnostic, fetch_peers.
- Returns a list of Relationship objects (or related node representations) depending on inputs.
- Includes behavior about how fields and HFID-related field selection are handled when requested.
approving b/c it sounds like you need this, but I think whatever LLM wrote the functions in infrahub.git.utils did not do a good job
Please can you add a fragment to include something in the release note
```diff
     name="Fetch repository commit",
     description="Retrieve a git repository at a given commit, if it does not already exist locally",
     cache_policy=NONE,
+@cached(
```
Not a blocker for this PR, but I'm wondering if we could leverage Prefect for the caching? And what is the limitation, if you already tried?
It was done using the Prefect caching mechanism in the first iteration, but as described in the following commit: 02b289d
- local caching requires a lot of setup (wrt Blocks, etc)
- it's memory intensive due to excessive pickle serialization in our case (we don't need it)
Revert to basic local TTL cache.
Basically Prefect caching is useful for distributed caching but here we want a local, efficient cache.
Signed-off-by: Fatih Acar <[email protected]>
Rebuilding the SDK SchemaBranch structure on each initialization is memory intensive and leads to spikes in memory usage. Avoid this by introducing a new cache attribute in the SchemaManager. Signed-off-by: Fatih Acar <[email protected]>
Unfortunately we cannot leverage Prefect's caching because:

- local caching requires a lot of setup (wrt Blocks, etc)
- it's memory intensive due to excessive serialization in our case

Revert to basic local TTL cache. Signed-off-by: Fatih Acar <[email protected]>
Small optimization to avoid an unnecessary database query when finding a GraphQL query by name. Signed-off-by: Fatih Acar <[email protected]>
Signed-off-by: Fatih Acar <[email protected]>
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️  Outside diff range comments (4)
backend/infrahub/computed_attribute/tasks.py (2)
267-269: Fix typo in comment.

```diff
-    # NOTE we only need the id of the nodes, we need to ooptimize the query here
+    # NOTE we only need the id of the nodes; we should optimize the query here
```
106-112: Based on my comprehensive verification of the codebase and external sources, I can now provide a definitive assessment.

The method `query_gql_query()` does not exist in the Infrahub SDK or in this codebase—calling it will result in an `AttributeError` at runtime.

Evidence:

- The Python SDK does not expose a method named query_gql_query; use execute_graphql / executeGraphQL to run raw GraphQL
- Repository-wide search found zero definitions of `query_gql_query` (not in backend, not in bundled SDK, not monkeypatched anywhere)
- Code calls `client.query_gql_query()` and `self.sdk.query_gql_query()` despite the method not existing
- The `python_sdk` directory in the repository is empty (0 bytes)
- `InfrahubClient` is imported from the official `infrahub_sdk.client` package, which does not contain this method

The code at lines 106–112 in backend/infrahub/computed_attribute/tasks.py (and similar calls in backend/infrahub/git/integrator.py) will fail immediately when executed. Replace `query_gql_query()` with the correct SDK method (likely `execute_graphql()` based on the web search result) and adjust parameters accordingly.

backend/infrahub/message_bus/types.py (1)
88-96: Avoid breaking change: make deprecated query_name optional and validate "one-of" with query_id.

Both query_name and query_id are required here. That breaks producers still sending only the deprecated field. Make query_name optional; optionally accept query_id or query_name (at least one), and prefer query_id downstream.

```diff
 class ProposedChangeArtifactDefinition(BaseModel):
@@
-    query_name: str  # Deprecated
-    query_id: str
+    query_name: str | None = Field(default=None)  # Deprecated
+    query_id: str | None = Field(default=None)
@@
     timeout: int
+
+    @staticmethod
+    def _missing(b: str | None) -> bool:
+        return b is None or b == ""
+
+    @model_validator(mode="after")
+    def _ensure_query_identifier(self) -> "ProposedChangeArtifactDefinition":
+        # Require at least one of query_id or query_name for backward compatibility
+        if self._missing(self.query_id) and self._missing(self.query_name):
+            raise ValueError("Either query_id or query_name must be provided")
+        return self
```

Add import at top:

```diff
-from pydantic import BaseModel, Field
+from pydantic import BaseModel, Field, model_validator
```

backend/infrahub/git/models.py (1)
41-46: Avoid breaking change: accept query or query_id (prefer query_id) with validator.

Same concern as above: both fields are required. Make the deprecated field optional and allow either; enforce "one-of".

```diff
-from pydantic import BaseModel, ConfigDict, Field
+from pydantic import BaseModel, ConfigDict, Field, model_validator
@@
-    query: str = Field(..., description="The name of the query to use when collecting data")  # Deprecated
-    query_id: str = Field(..., description="The id of the query to use when collecting data")
+    query: str | None = Field(default=None, description="The name of the query to use when collecting data")  # Deprecated
+    query_id: str | None = Field(default=None, description="The id of the query to use when collecting data")
@@
     context: InfrahubContext = Field(..., description="The context of the task")
+
+    @model_validator(mode="after")
+    def _ensure_query_identifier(self) -> "RequestArtifactGenerate":
+        if not self.query_id and not self.query:
+            raise ValueError("Either query_id or query must be provided")
+        return self
```
♻️ Duplicate comments (1)
backend/infrahub/git/tasks.py (1)
359-399: Verify batching semantics for workflow submissions.

client.create_batch is an SDK facility; passing get_workflow().submit_workflow may not benefit from SDK batching and could be a no-op or even serialize behind one transport. Please verify this supports arbitrary callables and actually runs them concurrently; otherwise prefer asyncio.gather around submit_workflow.

If unsupported, apply:

```diff
-    batch = await client.create_batch()
-    ...
-    batch.add(
-        task=get_workflow().submit_workflow,
-        workflow=REQUEST_ARTIFACT_GENERATE,
-        context=context,
-        parameters={"model": request_artifact_generate_model},
-    )
-    async for _, _ in batch.execute():
-        pass
+    import asyncio
+    coros = []
+    for relationship in group.members.peers:
+        ...
+        coros.append(
+            get_workflow().submit_workflow(
+                workflow=REQUEST_ARTIFACT_GENERATE,
+                context=context,
+                parameters={"model": request_artifact_generate_model},
+            )
+        )
+    await asyncio.gather(*coros)
```

Run to confirm Batch.add accepts non-SDK callables and executes them concurrently:

```bash
#!/bin/bash
# Inspect SDK batch API in repo for accepted task types and execution model
rg -nP -C3 'class\s+Batch\b|def\s+create_batch\b|def\s+add\(' --type=py
```
🧹 Nitpick comments (5)
backend/infrahub/core/schema/schema_branch.py (1)
165-171: Add a short docstring and consider JSON/alias dump for API stability.

This new public serializer looks good. To align with our Python docstring guidelines and avoid future alias/None drift in API payloads, add a docstring and switch to model_dump(mode="json", by_alias=True, exclude_none=True).

```diff
 def to_dict_api_schema_object(self) -> dict[str, list[dict]]:
-    return {
-        "nodes": [self.get(name, duplicate=False).model_dump() for name in self.nodes],
-        "profiles": [self.get(name, duplicate=False).model_dump() for name in self.profiles],
-        "generics": [self.get(name, duplicate=False).model_dump() for name in self.generics],
-        "templates": [self.get(name, duplicate=False).model_dump() for name in self.templates],
-    }
+    """Return an API-friendly representation of this branch's schema.
+
+    Returns:
+        Dict with keys: nodes, profiles, generics, templates. Each value is a list of Pydantic dumps.
+    """
+    return {
+        "nodes": [self.get(name, duplicate=False).model_dump(mode="json", by_alias=True, exclude_none=True) for name in self.nodes],
+        "profiles": [self.get(name, duplicate=False).model_dump(mode="json", by_alias=True, exclude_none=True) for name in self.profiles],
+        "generics": [self.get(name, duplicate=False).model_dump(mode="json", by_alias=True, exclude_none=True) for name in self.generics],
+        "templates": [self.get(name, duplicate=False).model_dump(mode="json", by_alias=True, exclude_none=True) for name in self.templates],
+    }
```

backend/infrahub/core/schema/manager.py (1)
770-791: Purge orphaned SDK cache entries when branches are removed.

Currently, purge_inactive_branches deletes _branches and _branch_hash_by_name entries but leaves stale items in _sdk_branches (keyed by old hashes). Clean them up to bound memory and avoid serving stale SDK schemas.

```diff
 for branch_name in list(self._branches.keys()):
     if branch_name not in active_branches:
         del self._branches[branch_name]
         if branch_name in self._branch_hash_by_name:
             del self._branch_hash_by_name[branch_name]
         removed_branches.append(branch_name)
+# Drop SDK cache entries for purged branches
+active_hashes = set(self._branch_hash_by_name.values())
+for key in list(self._sdk_branches.keys()):
+    if key not in active_hashes:
+        del self._sdk_branches[key]
```

backend/infrahub/git/repository.py (1)
254-282: Make cache key resilient to positional calls and add docstrings.

The cache key uses only kwargs; if someone inadvertently calls the private function positionally, the key may degenerate. Enforce keyword-only params and add docstrings per guidelines.

```diff
 @cached(
     TTLCache(maxsize=100, ttl=30),
     key=lambda *_, **kwargs: hashkey(
         kwargs.get("repository_id"), kwargs.get("name"), kwargs.get("repository_kind"), kwargs.get("commit")
     ),
 )
-async def _get_initialized_repo(
-    client: InfrahubClient, repository_id: str, name: str, repository_kind: str, commit: str | None = None
-) -> InfrahubReadOnlyRepository | InfrahubRepository:
+async def _get_initialized_repo(
+    *, client: InfrahubClient, repository_id: str, name: str, repository_kind: str, commit: str | None = None
+) -> InfrahubReadOnlyRepository | InfrahubRepository:
+    """Create or retrieve a cached repository wrapper for a given repo/commit."""
@@
-async def get_initialized_repo(
-    client: InfrahubClient, repository_id: str, name: str, repository_kind: str, commit: str | None = None
-) -> InfrahubReadOnlyRepository | InfrahubRepository:
+async def get_initialized_repo(
+    client: InfrahubClient, repository_id: str, name: str, repository_kind: str, commit: str | None = None
+) -> InfrahubReadOnlyRepository | InfrahubRepository:
+    """Prefect task wrapper: fetch a repository at a given commit (local cache, no Prefect caching)."""
     return await _get_initialized_repo(
         client=client, repository_id=repository_id, name=name, repository_kind=repository_kind, commit=commit
     )
```

All current call sites pass keyword args (good). This change prevents accidental positional calls in the future.
backend/infrahub/git/tasks.py (1)
329-335: Iterate relationships explicitly for clarity.

group.members.peers yields Relationship objects; consider naming the loop var relationship and access relationship.id or relationship.peer for readability and to avoid misuse. Example:

```diff
-    current_members = [member.id for member in group.members.peers]
+    current_members = [rel.id for rel in group.members.peers]  # relationship ids
```

backend/infrahub/git/utils.py (1)
58-75: Consider walking dict keys too when extracting parameter segments.

Some parameter schemas encode field paths in keys (e.g., {"device__name": "r1"}). Current walker inspects only values. Optionally include keys:

```diff
     def _walk(value: Any) -> None:
         if isinstance(value, str):
             segment = value.split("__", 1)[0]
             if segment:
                 segments.add(segment)
         elif isinstance(value, dict):
-            for nested in value.values():
-                _walk(nested)
+            for k, v in value.items():
+                _walk(k)
+                _walk(v)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
- `poetry.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (16)
- CHANGELOG.md (1 hunks)
- backend/infrahub/artifacts/models.py (1 hunks)
- backend/infrahub/computed_attribute/tasks.py (1 hunks)
- backend/infrahub/core/schema/manager.py (4 hunks)
- backend/infrahub/core/schema/schema_branch.py (1 hunks)
- backend/infrahub/generators/tasks.py (2 hunks)
- backend/infrahub/git/integrator.py (1 hunks)
- backend/infrahub/git/models.py (1 hunks)
- backend/infrahub/git/repository.py (3 hunks)
- backend/infrahub/git/tasks.py (6 hunks)
- backend/infrahub/git/utils.py (2 hunks)
- backend/infrahub/message_bus/types.py (1 hunks)
- backend/infrahub/proposed_change/tasks.py (8 hunks)
- backend/infrahub/workers/dependencies.py (2 hunks)
- changelog/+785ca0cd.fixed.md (1 hunks)
- pyproject.toml (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
- backend/infrahub/artifacts/models.py
- backend/infrahub/git/integrator.py
- backend/infrahub/workers/dependencies.py
- pyproject.toml
- backend/infrahub/generators/tasks.py
- backend/infrahub/proposed_change/tasks.py
🧰 Additional context used
📓 Path-based instructions (3)
backend/**/*
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Run backend tests with
pytestor viainvoketasks
Files:
- backend/infrahub/computed_attribute/tasks.py
- backend/infrahub/message_bus/types.py
- backend/infrahub/git/tasks.py
- backend/infrahub/core/schema/manager.py
- backend/infrahub/git/models.py
- backend/infrahub/git/utils.py
- backend/infrahub/git/repository.py
- backend/infrahub/core/schema/schema_branch.py
**/*.py
📄 CodeRabbit inference engine (.github/instructions/python-docstring.instructions.md)
**/*.py: Use triple double quotes (""") for all Python docstrings
Write docstrings in Google-style format
Include a brief one-line description at the top of each docstring
Add a detailed description section when additional context is needed
Document function/method parameters under an Args/Parameters section without typing information
Include a Returns section describing the return value
Include a Raises section listing possible exceptions
Provide an Examples section demonstrating usage when helpful
Use ruff and mypy to validate and lint Python files
**/*.py: Use type hints for all Python function parameters and return values
Prefer asynchronous code in Python when feasible
Define asynchronous Python functions with `async def`
Use `await` for asynchronous calls in Python
Use Pydantic models instead of standard dataclasses
Use ruff and mypy for linting and type checking
**/*.py: Use type hints for all Python function parameters and return values
Use async/await whenever possible in Python code
Define asynchronous functions with `async def`
Await asynchronous calls with `await`
Use Pydantic models instead of standard dataclasses for data modeling
Use triple quotes (""") for all Python docstrings
Write docstrings in Google-style format
Include docstring sections when applicable: one-line summary, optional details, Args (without types), Returns, Raises, Examples
Validate and lint Python with ruff and mypy
Files:
- backend/infrahub/computed_attribute/tasks.py
- backend/infrahub/message_bus/types.py
- backend/infrahub/git/tasks.py
- backend/infrahub/core/schema/manager.py
- backend/infrahub/git/models.py
- backend/infrahub/git/utils.py
- backend/infrahub/git/repository.py
- backend/infrahub/core/schema/schema_branch.py
**/*.{md,mdx}
📄 CodeRabbit inference engine (CLAUDE.md)
Lint Markdown/MDX files with markdownlint using
.markdownlint.yaml
Files:
- CHANGELOG.md
- changelog/+785ca0cd.fixed.md
🧬 Code graph analysis (5)
backend/infrahub/git/tasks.py (1)
backend/infrahub/git/utils.py (2)
fetch_artifact_definition_targets(143-150)
fetch_check_definition_targets(153-160)
backend/infrahub/core/schema/manager.py (2)
backend/infrahub/core/registry.py (2)
schema(79-83)
schema(86-87)backend/infrahub/core/schema/schema_branch.py (3)
get(303-329)
to_dict_api_schema_object(165-171)
get_hash(139-149)
backend/infrahub/git/utils.py (3)
backend/infrahub/core/protocols.py (3)
CoreArtifactDefinition(280-287)
CoreCheckDefinition(308-318)
CoreGroup(130-138)backend/infrahub/generators/models.py (1)
ProposedChangeGeneratorDefinition(34-44)backend/infrahub/core/schema/basenode_schema.py (2)
attribute_names(339-340)
relationship_names(343-344)
backend/infrahub/git/repository.py (1)
backend/infrahub/message_bus/operations/git/file.py (1)
get(14-35)
backend/infrahub/core/schema/schema_branch.py (4)
backend/infrahub/core/schema/manager.py (1)
get(69-92)
backend/infrahub/database/__init__.py (1)
get(84-88)
backend/infrahub/core/schema/__init__.py (2)
get(65-72)
duplicate(97-99)
backend/infrahub/core/models.py (3)
duplicate(500-502)
nodes(61-63)
generics(66-68)
🪛 LanguageTool
changelog/+785ca0cd.fixed.md
[grammar] ~1-~1: There might be a mistake here.
Context: ... and avoiding excessive GraphQL queries.
(QB_NEW_EN)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: E2E-testing-version-upgrade / From 1.3.0
- GitHub Check: E2E-testing-invoke-demo-start
- GitHub Check: backend-benchmark
- GitHub Check: E2E-testing-playwright
- GitHub Check: backend-tests-functional
- GitHub Check: backend-tests-unit
- GitHub Check: backend-tests-integration
- GitHub Check: backend-docker-integration
- GitHub Check: Cloudflare Pages
🔇 Additional comments (4)
backend/infrahub/core/schema/manager.py (1)
151-160: Verify that the codebase can accept the hash scope limitation (profiles/templates excluded from the SDK cache key), or implement the `get_hash_full()` approach.

The review identifies a real issue: line 152 can raise `KeyError` if `get_sdk_schema_branch()` is called before a branch is initialized. The suggested fix is sound: calling `get_schema_branch()` first ensures both the branch and its hash entry exist, since `set_schema_branch()` atomically sets both at lines 163–164.

However, the review also raises a valid concern about hash scope: the cache key uses `get_hash()`, which only incorporates nodes and generics (line 140), not profiles or templates (line 155). If profiles or templates change independently, the SDK cache will become stale. The suggested diff still uses `get_hash()` and leaves this unresolved, though the caveat acknowledges it.

Consider:

- Accept this limitation (profiles/templates changes are rare or externally managed)
- Or adopt the suggested approach of using `get_hash_full()` or including profiles/templates in the cache key derivation
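A minimal sketch of the suggested guard, using only methods named in this review (`get_schema_branch`, `get_hash`, `to_dict_api_schema_object`); the cache attribute and return type are assumptions for illustration:

```python
def get_sdk_schema_branch(self, name: str) -> dict:
    # Route through get_schema_branch() so a missing branch is created and
    # registered first, which removes the KeyError path described above.
    schema_branch = self.get_schema_branch(name=name)
    # get_hash() covers nodes/generics only, so profile/template changes
    # would not invalidate this key (the caveat discussed above).
    cache_key = schema_branch.get_hash()
    if cache_key not in self._sdk_schema_cache:  # hypothetical cache attribute
        self._sdk_schema_cache[cache_key] = schema_branch.to_dict_api_schema_object()
    return self._sdk_schema_cache[cache_key]
```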
backend/infrahub/git/tasks.py (2)

327-328: Good switch to prefetched targets. Using `fetch_artifact_definition_targets` centralizes logic and avoids per-item fetches.
379-381: Propagating query_id is correct. Including `query_id` alongside `query` reduces lookups and aligns with the new models.
backend/infrahub/git/utils.py (1)
119-131: Ensure members.peers are populated before prefetch. `include=["members"]` may not hydrate relationship peers in some SDK versions. If peers are empty, fetch them to get ids/kinds.
```diff
     group = await client.get(
         kind=CoreGroup,
         id=group_id,
         branch=branch,
         include=["members"],
     )
+    if not group.members.peers:
+        await group.members.fetch()
```
```python
async def _prefetch_group_member_nodes(
    client: InfrahubClient,
    members: RelationshipManager,
    branch: str,
    required_fields: set[str],
) -> None:
    ids_per_kind: dict[str, set[str]] = defaultdict(set)
```
🛠️ Refactor suggestion | 🟠 Major
Add Google‑style docstrings to new utilities.
Per repo guidelines, add docstrings to _prefetch_group_member_nodes, fetch_definition_targets, and public fetch* helpers.
Minimal examples:
 async def _prefetch_group_member_nodes(...):
+    """Batch‑prefetch group member nodes for required fields.
+
+    Args:
+        client: SDK client.
+        members: Relationship manager for group.members.
+        branch: Branch to query.
+        required_fields: Top‑level attribute/relationship names needed.
+
+    Returns:
+        None. Populates the client store with fetched nodes.
+    """
@@
 async def _fetch_definition_targets(...) -> CoreGroup:
+    """Fetch target group and prefetch members based on definition parameters."""
@@
 async def fetch_artifact_definition_targets(...) -> CoreGroup:
+    """Public helper to resolve targets for an artifact definition."""
@@
 async def fetch_check_definition_targets(...) -> CoreGroup:
+    """Public helper to resolve targets for a check definition."""
@@
 async def fetch_proposed_change_generator_definition_targets(...) -> CoreGroup:
+    """Public helper to resolve targets for a proposed change generator definition."""Also applies to: 119-141, 143-170
```python
    for peer in members.peers:
        if peer.id and peer.typename:
            ids_per_kind[peer.typename].add(peer.id)
```
Wrong entity from RelationshipManager.peers; use relationship.peer.
peers yields Relationship objects. Access the node via relationship.peer; otherwise peer.typename may not exist and ids_per_kind stays empty.
-    for peer in members.peers:
-        if peer.id and peer.typename:
-            ids_per_kind[peer.typename].add(peer.id)
+    for relationship in members.peers:
+        node = relationship.peer
+        if node.id and node.typename:
+            ids_per_kind[node.typename].add(node.id)

🤖 Prompt for AI Agents
In backend/infrahub/git/utils.py around lines 84 to 87, the loop is iterating
Relationship objects from members.peers but treats them as if they were nodes;
change the loop to treat each item as a relationship, retrieve the node via
relationship.peer, then check that that peer has id and typename before adding
to ids_per_kind (e.g., for relationship in members.peers: peer =
relationship.peer; if peer and peer.id and peer.typename:
ids_per_kind[peer.typename].add(peer.id)). Ensure you don't assume peers yields
raw nodes and guard against missing peer attributes.
backend/infrahub/git/utils.py (Outdated)
```python
        schema = await client.schema.get(kind=kind, branch=branch)

        # FIXME: https://github.com/opsmill/infrahub-sdk-python/pull/205
        valid_fields = set(schema.attribute_names) | set(schema.relationship_names)
        keep_relationships = set(schema.relationship_names) & required_fields
        cleaned_fields = required_fields ^ valid_fields
```
Call schema helpers, fix exclude set, and guard empty required_fields.
- attribute_names()/relationship_names() are methods; call them.
- Use difference (valid_fields - required_fields) for exclude to avoid unknown keys from XOR.
- Skip prefetch when required_fields is empty to avoid requesting “nothing”.
-        # FIXME: https://github.com/opsmill/infrahub-sdk-python/pull/205
-        valid_fields = set(schema.attribute_names) | set(schema.relationship_names)
-        keep_relationships = set(schema.relationship_names) & required_fields
-        cleaned_fields = required_fields ^ valid_fields
+        # FIXME: https://github.com/opsmill/infrahub-sdk-python/pull/205
+        attribute_names = set(schema.attribute_names())
+        relationship_names = set(schema.relationship_names())
+        valid_fields = attribute_names | relationship_names
+        keep_relationships = relationship_names & required_fields
+        cleaned_fields = valid_fields - required_fields
@@
-        kwargs: dict[str, Any] = {
+        kwargs: dict[str, Any] = {
             "kind": kind,
             "ids": ids,
             "branch": branch,
             "exclude": cleaned_fields,
             "populate_store": True,
             "order": Order(disable=True),
         }
@@
-    parameter_fields = _collect_parameter_first_segments(parameters)
-    await _prefetch_group_member_nodes(
-        client=client,
-        members=group.members,
-        branch=branch,
-        required_fields=parameter_fields,
-    )
+    parameter_fields = _collect_parameter_first_segments(parameters)
+    if parameter_fields:
+        await _prefetch_group_member_nodes(
+            client=client,
+            members=group.members,
+            branch=branch,
+            required_fields=parameter_fields,
+        )

Also applies to: 101-114, 132-139
🤖 Prompt for AI Agents
In backend/infrahub/git/utils.py around lines 94-99, 101-114, and 132-139, fix
three issues: call schema.attribute_names() and schema.relationship_names()
(they are methods), compute the exclude set as valid_fields - required_fields
instead of symmetric difference (required_fields ^ valid_fields), and skip the
prefetch/prefetch-related logic when required_fields is empty to avoid
requesting nothing; update any keep_relationships calculations to use
set(schema.relationship_names()) & required_fields accordingly and apply these
same changes in the other noted ranges.
CHANGELOG.md (Outdated)
```markdown
## [Infrahub - v1.4.10](https://github.com/opsmill/infrahub/tree/infrahub-v1.4.10) - 2025-10-17

### Added

- The login form now automatically focuses on the first field.

### Fixed

- - Consistent font size for all events in the Proposed Change timeline
  - Proposed Change action buttons now keep their size and does not strectch anymore
  - Prevent overflow on the create new relationship button within the relationship input
  - fixed typos

  ([#IFC-1780](https://github.com/opsmill/infrahub/issues/IFC-1780))
- - Improved logging for SSO authentication to provide better debugging information
  - Enhanced error handling to properly support all error codes returned by identity providers

  ([#6969](https://github.com/opsmill/infrahub/issues/6969))
- - Correctly display XML and CSV artifacts in the UI.
  - Added a fallback to plain text for unsupported content types.

  ([#7294](https://github.com/opsmill/infrahub/issues/7294))
- Fix a bug that allowed duplicate attributes and/or relationships on Node or Generic schemas to be merged into the default branch,
  which would cause the application and workers to crash with an error message similar to the following:

  > ValueError: SchemaName: Names of attributes and relationships must be unique : ['field_name_1', 'field_name_2']

  Added a new CLI command `infrahub db check-duplicate-schema-fields` to resolve this duplicated schema fields issue if it appears. ([#7346](https://github.com/opsmill/infrahub/issues/7346))
- Fixed an issue where boolean fields in the object Details view always displayed a checkmark, even when the value was false. ([#7372](https://github.com/opsmill/infrahub/issues/7372))
- Fixed prefix utilization showing as greater than 100% after setting the pool attribute to false ([#7388](https://github.com/opsmill/infrahub/issues/7388))
- Corrected the labels on the branch list and detailed view to use the correct terminology
- Fixed issue with number pool popover stuck in the top-left corner and not expandable during the initial render in some cases.
```
🧩 Analysis chain
Duplicate version header and typos.
- Two entries named v1.4.10 (2025-10-17 and 2025-10-01). Please bump the new entry’s version (e.g., v1.4.11) or merge into the existing 1.4.10 section.
- Fix typos: “strectch” → “stretch”.
-## [Infrahub - v1.4.10] ... - 2025-10-17
+## [Infrahub - v1.4.11] ... - 2025-10-17  # or merge into 1.4.10
@@
-  - Proposed Change action buttons now keep their size and does not strectch anymore
+  - Proposed Change action buttons now keep their size and do not stretch anymore

🏁 Script executed:
#!/bin/bash
# Check for duplicate v1.4.10 entries and find the typo
echo "=== All v1.4.10 entries in CHANGELOG.md ==="
rg -n "v1\.4\.10" CHANGELOG.md
echo ""
echo "=== Line with 'strectch' typo ==="
rg -n "strectch" CHANGELOG.mdLength of output: 364
Duplicate version header and typos confirmed—resolve version conflict and fix typos.
Verification confirms:
- Two v1.4.10 entries exist (lines 14 and 47 with dates 2025-10-17 and 2025-10-01 respectively)
- Typo "strectch" on line 23
- Grammar error "does not" should be "do not" (plural subject "buttons")
Resolve by either bumping the newer entry to v1.4.11 or merging both entries into a single v1.4.10 section, and apply the typo/grammar fixes:
-  - Proposed Change action buttons now keep their size and does not strectch anymore
+  - Proposed Change action buttons now keep their size and do not stretch anymore

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```markdown
## [Infrahub - v1.4.10](https://github.com/opsmill/infrahub/tree/infrahub-v1.4.10) - 2025-10-17

### Added

- The login form now automatically focuses on the first field.

### Fixed

- - Consistent font size for all events in the Proposed Change timeline
  - Proposed Change action buttons now keep their size and do not stretch anymore
  - Prevent overflow on the create new relationship button within the relationship input
  - fixed typos

  ([#IFC-1780](https://github.com/opsmill/infrahub/issues/IFC-1780))
- - Improved logging for SSO authentication to provide better debugging information
  - Enhanced error handling to properly support all error codes returned by identity providers

  ([#6969](https://github.com/opsmill/infrahub/issues/6969))
- - Correctly display XML and CSV artifacts in the UI.
  - Added a fallback to plain text for unsupported content types.

  ([#7294](https://github.com/opsmill/infrahub/issues/7294))
- Fix a bug that allowed duplicate attributes and/or relationships on Node or Generic schemas to be merged into the default branch,
  which would cause the application and workers to crash with an error message similar to the following:

  > ValueError: SchemaName: Names of attributes and relationships must be unique : ['field_name_1', 'field_name_2']

  Added a new CLI command `infrahub db check-duplicate-schema-fields` to resolve this duplicated schema fields issue if it appears. ([#7346](https://github.com/opsmill/infrahub/issues/7346))
- Fixed an issue where boolean fields in the object Details view always displayed a checkmark, even when the value was false. ([#7372](https://github.com/opsmill/infrahub/issues/7372))
- Fixed prefix utilization showing as greater than 100% after setting the pool attribute to false ([#7388](https://github.com/opsmill/infrahub/issues/7388))
- Corrected the labels on the branch list and detailed view to use the correct terminology
- Fixed issue with number pool popover stuck in the top-left corner and not expandable during the initial render in some cases.
```
🤖 Prompt for AI Agents
In CHANGELOG.md around lines 14 to 46 there is a duplicate v1.4.10 header (also
present at line ~47) and two text issues: "strectch" should be "stretch" and
"does not" should be "do not" for the plural "buttons"; resolve the duplication
by either renaming the newer entry to "Infrahub - v1.4.11" with the 2025-10-17
date or by merging the two v1.4.10 blocks into one consolidated v1.4.10 entry
(remove the duplicate block), then apply the typo fixes ("stretch") and grammar
fix ("do not") and ensure references/issue links remain intact.
```diff
@@ -0,0 +1 @@
+Improved artifacts generation and proposed change checks performance by leveraging caching and avoiding excessive GraphQL queries.
```
Tighten phrasing.
Suggest active voice and remove “excessive”:
-Improved artifacts generation and proposed change checks performance by leveraging caching and avoiding excessive GraphQL queries.
+Improve artifact generation and proposed change checks by leveraging caching and reducing GraphQL queries.

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Improve artifact generation and proposed change checks by leveraging caching and reducing GraphQL queries.
🧰 Tools
🪛 LanguageTool
[grammar] ~1-~1: There might be a mistake here.
Context: ... and avoiding excessive GraphQL queries.
(QB_NEW_EN)
🤖 Prompt for AI Agents
In changelog/+785ca0cd.fixed.md around line 1, the sentence uses passive
phrasing and the word “excessive”; change it to active voice and remove
“excessive.” Replace the line with a concise active version such as: "Improved
artifacts generation and sped up proposed change checks by leveraging caching
and reducing GraphQL queries."
2ce5840    to
    072eece      
    Compare
  
    Signed-off-by: Fatih Acar <[email protected]>
Signed-off-by: Fatih Acar <[email protected]>
072eece    to
    f9bde88      
    Compare
  
Actionable comments posted: 0
♻️ Duplicate comments (2)
backend/infrahub/git/utils.py (2)
97-99: Call schema methods to retrieve field names.
`schema.attribute_names` and `schema.relationship_names` are methods that must be called with `()` to return the actual lists of field names. Apply this diff:
```diff
-        valid_fields = set(schema.attribute_names) | set(schema.relationship_names)
-        keep_relationships = set(schema.relationship_names) & required_fields
+        attribute_names = set(schema.attribute_names())
+        relationship_names = set(schema.relationship_names())
+        valid_fields = attribute_names | relationship_names
+        keep_relationships = relationship_names & required_fields
```
84-86: Fix peers iteration to access node attributes correctly.
`members.peers` yields `Relationship` objects, not nodes directly. You need to access `relationship.peer` to get the actual node before checking `id` and `typename`. Apply this diff:
```diff
-    for peer in members.peers:
-        if peer.id and peer.typename:
-            ids_per_kind[peer.typename].add(peer.id)
+    for relationship in members.peers:
+        node = relationship.peer
+        if node and node.id and node.typename:
+            ids_per_kind[node.typename].add(node.id)
```
🧹 Nitpick comments (4)
backend/infrahub/git/utils.py (4)
58-74: Consider adding a docstring. Per coding guidelines, Python functions should include Google-style docstrings. This helper would benefit from a brief description of its purpose and parameters. Example:
Example:
def _collect_parameter_first_segments(params: Any) -> set[str]: """Extract top-level field names from parameter structures. Recursively walks nested parameter structures (dicts, lists, tuples, sets) and extracts the first segment (before '__') from string values. Args: params: Parameter structure to walk (can be dict, list, or scalar) Returns: Set of top-level field names referenced in the parameters """
77-116: Consider adding a docstring. Per coding guidelines, this async function should have a Google-style docstring documenting its purpose, parameters, and behavior. Example:
Example:
async def _prefetch_group_member_nodes( client: InfrahubClient, members: RelationshipManager, branch: str, required_fields: set[str], ) -> None: """Batch-prefetch group member nodes with required fields. Groups member nodes by kind, computes minimal field sets based on required_fields, and uses batched client requests to populate the client store efficiently. Args: client: SDK client for API interactions members: RelationshipManager for group.members branch: Branch name to query required_fields: Top-level attribute/relationship names needed Returns: None. Populates the client store with fetched nodes. """
119-140: Consider adding a docstring. This core utility function would benefit from a Google-style docstring explaining its purpose and flow. Example:
Example:
async def _fetch_definition_targets( client: InfrahubClient, branch: str, group_id: str, parameters: Any, ) -> CoreGroup: """Fetch target group and prefetch member nodes based on definition parameters. Retrieves the target group by ID, analyzes the definition parameters to determine required fields, and prefetches member nodes with those fields populated in the client store. Args: client: SDK client for API interactions branch: Branch name to query group_id: ID of the target group parameters: Definition parameters used to determine required fields Returns: The target group with prefetched member nodes """
143-170: Consider adding docstrings to public functions. These public API functions should have Google-style docstrings per coding guidelines. Brief examples:
```python
async def fetch_artifact_definition_targets(
    client: InfrahubClient,
    branch: str,
    definition: CoreArtifactDefinition,
) -> CoreGroup:
    """Fetch target group for an artifact definition with prefetched members.

    Args:
        client: SDK client for API interactions
        branch: Branch name to query
        definition: Artifact definition containing targets and parameters

    Returns:
        Target group with prefetched member nodes
    """
```

Similar docstrings for `fetch_check_definition_targets` and `fetch_proposed_change_generator_definition_targets`.
fetch_check_definition_targetsandfetch_proposed_change_generator_definition_targets.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- backend/infrahub/generators/tasks.py (2 hunks)
- backend/infrahub/git/tasks.py (6 hunks)
- backend/infrahub/git/utils.py (2 hunks)
- backend/infrahub/proposed_change/tasks.py (8 hunks)
- changelog/+785ca0cd.fixed.md (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
backend/**/*
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Run backend tests with `pytest` or via `invoke` tasks
Files:
- backend/infrahub/git/tasks.py
- backend/infrahub/proposed_change/tasks.py
- backend/infrahub/generators/tasks.py
- backend/infrahub/git/utils.py
**/*.py
📄 CodeRabbit inference engine (.github/instructions/python-docstring.instructions.md)
**/*.py: Use triple double quotes (""") for all Python docstrings
Write docstrings in Google-style format
Include a brief one-line description at the top of each docstring
Add a detailed description section when additional context is needed
Document function/method parameters under an Args/Parameters section without typing information
Include a Returns section describing the return value
Include a Raises section listing possible exceptions
Provide an Examples section demonstrating usage when helpful
Use ruff and mypy to validate and lint Python files
**/*.py: Use type hints for all Python function parameters and return values
Prefer asynchronous code in Python when feasible
Define asynchronous Python functions with `async def`
Use `await` for asynchronous calls in Python
Use Pydantic models instead of standard dataclasses
Use ruff and mypy for linting and type checking
**/*.py: Use type hints for all Python function parameters and return values
Use async/await whenever possible in Python code
Define asynchronous functions with `async def`
Await asynchronous calls with `await`
Use Pydantic models instead of standard dataclasses for data modeling
Use triple quotes (""") for all Python docstrings
Write docstrings in Google-style format
Include docstring sections when applicable: one-line summary, optional details, Args (without types), Returns, Raises, Examples
Validate and lint Python with ruff and mypy
Files:
- backend/infrahub/git/tasks.py
- backend/infrahub/proposed_change/tasks.py
- backend/infrahub/generators/tasks.py
- backend/infrahub/git/utils.py
**/*.{md,mdx}
📄 CodeRabbit inference engine (CLAUDE.md)
Lint Markdown/MDX files with markdownlint using `.markdownlint.yaml`
Files:
- changelog/+785ca0cd.fixed.md
🧬 Code graph analysis (4)
backend/infrahub/git/tasks.py (2)
backend/infrahub/git/utils.py (2)
fetch_artifact_definition_targets(143-150)
fetch_check_definition_targets(153-160)
backend/infrahub/services/adapters/workflow/local.py (1)
submit_workflow(33-41)
backend/infrahub/proposed_change/tasks.py (2)
backend/infrahub/core/protocols.py (1)
CoreArtifactDefinition(280-287)
backend/infrahub/git/utils.py (2)
fetch_artifact_definition_targets(143-150)
fetch_proposed_change_generator_definition_targets(163-170)
backend/infrahub/generators/tasks.py (1)
backend/infrahub/git/utils.py (1)
fetch_proposed_change_generator_definition_targets(163-170)
backend/infrahub/git/utils.py (4)
backend/infrahub/core/relationship/model.py (1)
RelationshipManager(718-1242)
backend/infrahub/core/protocols.py (3)
CoreArtifactDefinition(280-287)
CoreCheckDefinition(308-318)
CoreGroup(130-138)
backend/infrahub/generators/models.py (1)
ProposedChangeGeneratorDefinition(34-44)
backend/infrahub/core/schema/basenode_schema.py (2)
attribute_names(339-340)
relationship_names(343-344)
🪛 LanguageTool
changelog/+785ca0cd.fixed.md
[grammar] ~1-~1: There might be a mistake here.
Context: ... and avoiding excessive GraphQL queries.
(QB_NEW_EN)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: E2E-testing-version-upgrade / From 1.3.0
- GitHub Check: backend-benchmark
- GitHub Check: E2E-testing-playwright
- GitHub Check: E2E-testing-invoke-demo-start
- GitHub Check: validate-generated-documentation
- GitHub Check: backend-tests-functional
- GitHub Check: documentation
- GitHub Check: backend-tests-integration
- GitHub Check: backend-tests-unit
- GitHub Check: backend-docker-integration
🔇 Additional comments (11)
backend/infrahub/generators/tasks.py (2)
24-24: LGTM! The import is correctly added to support the new fetch utility function used below.
181-183: LGTM! The switch to the prefetch utility function is correct and will improve performance by batching member node fetches based on definition parameters.
backend/infrahub/proposed_change/tasks.py (5)
47-47: LGTM! The new imports support the updated artifact definition handling and prefetch utilities used throughout the file.
Also applies to: 61-61
616-616: LGTM! Using the protocol type `CoreArtifactDefinition` provides better type safety, and the fetch utility efficiently prefetches member nodes based on the definition's parameters.

Also applies to: 656-658
695-695: LGTM! Adding `query_id` enables artifact generation to use the query identifier directly, reducing the need for name-based lookups and improving performance.
922-924: LGTM! Consistent use of the prefetch utility for generator definitions improves performance through batched member node retrieval.
1245-1245: LGTM! Adding the query `id` field to the GraphQL query and parsing it into `query_id` supports the migration from name-based to ID-based query references throughout the artifact generation flow.

Also applies to: 1467-1467
backend/infrahub/git/tasks.py (4)
56-56: LGTM! The import provides the prefetch utilities used throughout the file for efficient target resolution.
327-328: LGTM! The prefetch utility efficiently resolves artifact definition targets and pre-populates member nodes in the client store based on parameter requirements.
359-398: LGTM! The batch submission pattern correctly groups artifact generation requests to reduce Prefect API overhead. Including `query_id` (line 380) supports the shift from name-based to ID-based query references, which should reduce database lookups during artifact generation.
580-580: LGTM! Consistent use of the prefetch utility for check definitions ensures member nodes are efficiently fetched before iteration.
```python
    TTLCache(maxsize=100, ttl=30),
    key=lambda *_, **kwargs: hashkey(
        kwargs.get("repository_id"), kwargs.get("name"), kwargs.get("repository_kind"), kwargs.get("commit")
    ),
```
Given the TTL it might not have a practical impact now, but since commit can be None I think we might occasionally end up with a repo using an earlier commit than we would have expected. In another iteration we could instead cache after we have the commit. Repository_id and commit are probably enough to use for the key.
The TTL is lower than the git sync scheduled job on purpose because I'd expect this kind of edge case... I think it's fine like that for the moment.
+1 for the key.
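A minimal sketch of the narrowed key, assuming the `cachetools` helpers already used in the diff above (`hashkey` from `cachetools.keys`); the function name is made up:

```python
from cachetools.keys import hashkey


def _repo_cache_key(*_, **kwargs):
    # Key only on repository_id and commit, per the suggestion above; name and
    # repository_kind no longer produce distinct entries, and the short TTL
    # limits how long an entry created with commit=None can linger.
    return hashkey(kwargs.get("repository_id"), kwargs.get("commit"))
```

Passing `key=_repo_cache_key` in place of the inline lambda would leave the rest of the decorator unchanged.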
```python
    return repositories


def _collect_parameter_first_segments(params: Any) -> set[str]:
```
Do we need an `Any` here? It looks like we expect str | list | tuple | set | dict and won't use anything else.
I think LLM used Any here because we pass parameters straight from the JSONAttribute protocol where value is typed as Any. This looks legit?
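If a tighter annotation were ever wanted, a recursive alias over the shapes the walker actually descends into is one option; a sketch (the alias name is hypothetical, and keeping `Any` is also reasonable given the `JSONAttribute` typing mentioned above):

```python
from typing import Union

# Recursive alias for the value shapes _walk() handles; ParamValue is an
# invented name, not something defined in the codebase.
ParamValue = Union[str, dict[str, "ParamValue"], list["ParamValue"], tuple["ParamValue", ...], set["ParamValue"]]


def _collect_parameter_first_segments(params: "ParamValue") -> set[str]:
    ...
```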
```python
def _collect_parameter_first_segments(params: Any) -> set[str]:
    segments: set[str] = set()

    def _walk(value: Any) -> None:
        if isinstance(value, str):
            segment = value.split("__", 1)[0]
            if segment:
                segments.add(segment)
        elif isinstance(value, dict):
            for nested in value.values():
                _walk(nested)
        elif isinstance(value, (list, tuple, set)):
            for nested in value:
                _walk(nested)

    _walk(params)
    return segments
```
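For illustration, with invented parameter values using the SDK's double-underscore traversal syntax, the helper keeps only the leading segment of each string value:

```python
params = {
    "device": "name__value",
    "extra": ["interfaces__name", {"site": "site__name__value"}],
}
print(_collect_parameter_first_segments(params))
# -> {"name", "interfaces", "site"}  (set ordering may vary)
```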
Hey coderabbit, where is this requirement for docstring coming from?
Many small changes here:
We are now able to generate 1000 artifacts under 2 minutes without errors, whereas it would take more than five minutes with failures.
Summary by CodeRabbit
Release Notes
Performance Improvements
Changes
`query_id` instead for future compatibility.